The data in internet social media has the characteristics of fast transmission, high user participation and complete coverage compared with traditional media under the background of the rise of various platforms on the internet.There are various topics that people pay attention to and publish comments in, and there may exist deeper and more fine-grained sub-topics in the related information of one topic. A survey of sub-topic detection based on internet social media, as a newly emerging and developing research field, was proposed. The method of obtaining topic and sub-topic information through social media and participating in the discussion is changing people’s lives in an all-round way. However, the technologies in this field are not mature at present, and the researches are still in the initial stage in China. Firstly, the development background and basic concept of the sub-topic detection in internet social media were described. Secondly, the sub-topic detection technologies were divided into seven categories, each of which was introduced, compared and summarized. Thirdly, the methods of sub-topic detection were divided into online and offline methods, and the two methods were compared, then the general technologies and the frequently used technologies of the two methods were listed. Finally, the current shortages and future development trends of this field were summarized.
Because of the complexity of human language, text sentiment classification algorithms mostly have the problem of excessively huge vocabulary due to redundancy. Deep Belief Network (DBN) can solve this problem by learning useful information in the input corpus and its hidden layers. However, DBN is a time-consuming and computationally expensive algorithm for large applications. Aiming at this problem, a semi-supervised sentiment classification algorithm called text sentiment classification algorithm based on Feature Selection and Deep Belief Network (FSDBN) was proposed. Firstly, the feature selection methods including Document Frequency (DF), Information Gain (IG), CHI-square statistics (CHI) and Mutual Information (MI) were used to filter out some irrelevant features to reduce the complexity of vocabulary. Then, the results of feature selection were input into DBN to make the learning phase of DBN more efficient. The proposed algorithm was applied to Chinese and Uygur language. The experimental results on hotel review dataset show that the accuracy of FSDBN is 1.6% higher than that of DBN and the training time of FSDBN halves that of DBN.